Haitao Mi

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

May 14, 2026

Reinforcing Multimodal Reasoning Against Visual Degradation

May 10, 2026

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

May 10, 2026

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

Apr 23, 2026

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

Apr 20, 2026

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Apr 20, 2026

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Feb 12, 2026

Free(): Learning to Forget in Malloc-Only Reasoning Models

Feb 08, 2026

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Feb 04, 2026

Verified Critical Step Optimization for LLM Agents

Feb 03, 2026